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EXECUTIVE  SUMMARY 


This  report  reviews  current  fusion  techniques  used  for  bathymetry  or  other  geospatial  data,  as 
motivated  by  the  Naval  Oceanographic  Office’s  (NAVOCEANO)  need  for  intelligent  fusion  -  combining 
two  or  more  data  sets  in  a  manner  that  accounts  for  data  uncertainty  -  of  gridded  and  in  situ  bathymetric 
data  sets.  Currently,  NAVOCEANO’s  bathymetry  database,  DBDB-V,  incoiporates  the  Feathering 
Algorithm  to  smooth  discontinuities  that  occur  between  tiles  in  the  database  with  dissimilar  resolutions. 
This  technique,  however,  still  leaves  artifacts  and  does  not  provide  uncertainty  estimates.  NAVOCEANO 
needs  intelligent  fusion  capabilities  not  only  to  fuse  data  sets  in  a  manner  that  takes  the  uncertainty  of  the 
data  into  account,  but  also  to  generate  products  that  meet  hydrography  standards  and  to  provide  the 
capability  to  use  one  database  for  multiple  purposes.  A  review  of  the  technical  literature  indicates  that 
current  state-of-the  art  fusion  techniques  that  have  been  used  on  bathymetric  data  include  splines-in- 
tension  interpolation,  locally  weighted  regression  (loess),  and  kriging.  In  addition,  there  are  new 
techniques  based  on  Bayesian  inference,  but  these  appear  to  require  further  development  before  being 
ready  for  operational  implementation.  Based  on  this  review,  we  recommend  an  approach  for  building  new 
bathymetry  fusion  algorithms  that  was  published  and  validated  for  bathymetry  data  recently  by  Calder*. 
This  approach  uses  both  loess  interpolation  to  obtain  a  trend  surface,  followed  by  kriging  of  residuals  to 
recapture  finer  details  lost  from  smoothing.  In  addition,  if  in  situ  soundings  are  used,  Monte  Carlo 
simulations  are  run  to  estimate  depth  error  induced  by  position  errors.  The  technique  also  provides  the 
means  to  liberally  estimate  errors  for  navigation  safety.  The  Naval  Research  Laboratory  (NRL)  plans  to 
build  and  validate  a  fusion  algorithm  based  on  this  approach.  The  work  leverages  other  NRL  efforts  that 
developed  data  fusion  capabilities  using  loess  interpolation  and  adds  in  additional  required  components  in 
the  build  schedule.  The  algorithm  will  support  the  new  Bathymetric  Attributed  Grid  (BAG)  format  of  the 
Open  Navigation  Surface  Project.  NRL  plans  to  transition  the  software  to  the  Naval  Oceanographic 
Office,  Bathymetry  Database  Division. 
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ALGORITHM  DESIGN  STUDY  FOR  BATHYMETRY  FUSION  -  REVIEW  OF 
CURRENT  STATE-OF-THE-ART  AND  RECOMMENDED  DESIGN  APPROACH 


1.  INTRODUCTION 

The  Digital  Bathymetry  Data  Base  -  Variable  Resolution  (DBDB-V)  is  the  bathymetric  database 
maintained  by  the  Naval  Oceanographic  Office  (NAVOCEANO)  for  worldwide  bathymetry  data.  The 
latest  release,  Version  5.2,  has  global  coverage  at  2  minutes  of  resolution,  primarily  based  on  satellite 
altimetry  data  of  the  ocean  surface  (which  then  provides  bathymetry  from  the  inferred  gravitational 
anomalies)  and  ship  soundings  [1].  Higher  resolution  data  in  selected  areas  also  are  contained  in  the 
database  in  resolutions  of  1,  0.5,  0.1,  and  0.05  [2]  arc  minutes.  DBDB-V  stores  these  data  sets  in 
rectangular  grid  tiles. 

When  the  end  user  extracts  data  for  an  area  covered  by  multiple  tiles  with  different  resolutions  using 
nearest  neighbor  interpolation  or  bicubic  spline  interpolation,  there  are  discontinuity  artifacts  at  tile 
boundaries.  These  discontinuities  may  be  caused  by  differences  in  spatial  frequencies  or  data  accuracy 
across  tile  boundaries  or  both.  As  a  result,  as  discussed  in  Steed  and  Rankin  [3],  false  cliffs  appear  in  the 
extracted  data  at  tile  boundaries  (Fig.  1).  The  artifacts  have  negative  visual  impact  and  cause  errors  in 
oceanographic  and  acoustic  modeling.  To  mitigate  these  errors,  the  OAML  Feathering  Algorithm  [3]  was 
implemented  in  DBDB-V  Version  5.2  to  smooth  discontinuities  at  the  boundaries.  The  algorithm 
produces  a  minimum- curvature  spline  grid  for  the  extracted  area  using  the  “mbzgrid”  C  function  of  the 
MB-System  software  suite  [4]  (open-source  software  used  for  processing  and  visualizing  bathymetric  and 
acoustic  backscatter  data).  As  shown  in  Steed  and  Rankin  [3],  the  algorithm  smoothes  discontinuities 
between  dissimilar  tiles  and  should  reduce  errors  for  modeling  applications.  Artifacts,  although  smoothed, 
remain.  For  example,  some  false  cliffs  become  ramps  in  between  transition  regions  (Fig.  2).  The  OAMF 
Feathering  Algorithm,  however,  was  not  intended  for  permanent  use;  it  was  a  “stop-gap  measure”  to  be 
used  until  intelligent  fusion  algorithms  could  be  created. 

Intelligent  fusion  differs  fundamentally  from  feathering  in  that  the  uncertainty  of  the  data  points  is 
taken  into  account  when  generating  an  interpolation  surface  from  different  data  sets.  The  inclusion  of 
uncertainty  (or  error)  allows  for  the  use  of  different  techniques  that  weight  the  data  using  the  errors  and 
provides  error  estimates  with  the  bathymetry.  Hence,  the  products  of  intelligent  fusion  are  a  bathymetry 
layer  and  an  uncertainty  layer. 


Manuscript  approved  March  1 1,  2008. 
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Fig.  1  -  Artifact  discontinuities  between  DBDB-V  Version  3.0  tiles  of  dissimilar 
resolutions  (5.0  minutes  in  coarse  region,  0.5  minutes  in  finer  region)  [3], 
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(c)  Depth  Profile  Comparison 


(d)  DBDB-V  Resolution  Source  Map 


(a)  DBDB-V  4.1  Output  (bicubic  spline) 


(b)  Feathered  DBDB-V  Output 
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5  minute 


Fig.  2  -  Example  improvement  of  artifact  discontinuities  using  the  OAML 
Feathering  Algorithm  on  tiles  from  Fig.  1  [3], 


This  result  is  desirable  for  a  number  of  reasons.  First,  data  points  with  smaller  errors  are  more 
important  to  surface  generation.  In  an  era  where  highly  accurate  multibeam  echo  sounder  systems  are 
available,  it  is  desirable  to  give  more  credence  to  these  data  than  to  data  collected  in  an  era  with  vertical 
beam  echo  sounders  (or  older  technology)  and  higher  uncertainty  in  navigation.  Second,  specification  of 
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uncertainty  allows  for  multiple  use  of  a  bathymetry  database,  so  that  navigation  charts  can  be  generated 
from  the  same  database  (by  shoaling  the  data  with  an  appropriate  uncertainty  level)  that  would  be  used  by 
modelers.  Third,  modelers  can,  in  principle,  provide  error  estimates  of  predictions  based  on  propagation 
of  bathymetry  errors  through  the  models.  Lastly,  uncertainty  in  the  interpolated  points  is  required  for 
International  Hydrography  Organization  (IHO)  standards  [5].  Section  5.3  of  the  fifth  edition  (draft) 
dictates: 

All  data  should  be  attributed  with  a  95%  statistical  error  estimate  for  position  and  depth  where 
appropriate.  For  soundings  this  should  preferably  be  done  for  each  individual  sounding;  however 
a  single  error  estimate  may  be  recorded  for  a  number  of  soundings  or  even  for  an  area,  provided 
any  difference  between  the  individual  error  estimates  can  be  safely  expected  to  be  negligible. 

Quantifying  uncertainty  in  sparse  geospatial  data  sets  has  become  a  focus  of  research  within  the  past 
5  to  7  years.  Increased  computational  capabilities  have  made  it  possible  to  collect  high  density  and  high 
volume  data  sets,  which  allows  for  error  estimation  based  on  statistical  techniques.  Also,  navigational  and 
sonar  uncertainties  have  become  lower,  resulting  in  higher  precision  in  the  data  obtained.  Both  of  these 
factors  and  others  are  used  by  a  new  data  processing  algorithm,  the  Combined  Uncertainty  and 
Bathymetric  Estimator  (CUBE)  for  providing  robust  estimates  of  bathymetry  and  uncertainty  [6].  With 
regard  to  product  generation,  traditional  interpolation  techniques  for  sparse  data  sets  have  become  more 
pragmatic  with  newer,  more  efficient  algorithms  and  increased  computational  speed  to  allow  for  the  use 
of  newer  robust  estimations  with  more  computationally  expensive  procedures. 

In  response  to  the  IHO  requirements  and  newer  error  estimation  capabilities,  a  new  data  format,  the 
Bathymetric  Attributed  Grid  (BAG)  [7],  has  been  designed  by  the  Open  Navigation  Surface  Working 
Group,  a  consortium  of  academic,  government  and  private  sector  groups.  Central  to  the  BAG’S  design  is 
the  requirement  that  it  hold  both  bathymetry  and  uncertainty  data  for  rectangular  bathymetric  tiles.  This 
format  has  now  been  adopted  by  two  commercial  software  suites  used  for  bathymetric  data  processing 
and  analysis,  and  is  now  being  used  by  NAVOCEANO  to  store  output  from  new  multibeam  data  sets 
processed  by  CUBE.  Thus,  the  adoption  of  intelligent  procedures  for  fusing  bathymetry  data  sets  is  now 
not  only  a  pragmatic  possibility  but  also  a  necessity. 

The  purpose  of  this  report  is  to  provide  a  starting  point  for  providing  NAVOCEANO  with  intelligent 
fusion  availabilities  for  current  and  future  bathymetry  data  sets.  In  Section  2,  we  first  review  the  current 
state-of-the-art  techniques  for  intelligent  fusion  of  sparse  data  sets,  some  of  which  were  developed  for 
bathymetry  data.  This  section  not  only  serves  as  a  review  but  also  provides  background  for  describing  this 
reports  recommended  technique,  which  is  a  new  three-step  approach  by  Calder  [8].  Section  3  reviews  this 
approach.  Section  4  provides  discussion  of  software  build  design  and  case  studies  to  use  for  validation. 
Finally,  Section  5  provides  summary  remarks. 
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2.  REVIEW  OF  CURRENT  STATE-OF-THE-ART 

As  discussed  in  Davis  [9]  and  reiterated  by  Smith  and  Wessel  [10],  algorithms  that  generate  gridded 
surfaces  from  sparse  data  sets  assume  the  following: 

a)  Grid  points  are  single- valued. 

b)  Continuity  exists  everywhere  on  the  generated  surface. 

c)  Autocorrelation  of  the  surface  is  positive  for  distance  greater  than  the  node  spacing. 

These  assumptions  should  be  maintained  by  intelligent  fusion  algorithms.  In  order  to  satisfy  condition 
one,  the  point  values,  which  will  now  be  a  statistical  quantity  with  an  estimated  uncertainty,  can  be  a 
mean  or  median. 

Crain  [11]  and  Franke  [12]  provide  reviews  of  earlier  interpolation  techniques.  These  methods 
include  polynomial  interpolation,  inverse-distance  weighting,  triangulation,  piecewise  contour  line 
segments,  simple  finite  difference  and  finite  element  methods.  These  techniques  are  not  reviewed  again 
here.  The  newer  techniques  discussed  below  have  either  been  applied  to  bathymetry  or  have  some  form  of 
robust  uncertainty  estimation,  or  both.  We  first  discuss  “splines-in-tension,”  which  is  used  for  the  OAML 
Feathering  Algorithm  and  has  been  applied  to  bathymetry  data  since  the  1970s;  uncertainty  estimation 
was  added-on  in  2002.  Next,  are  reviews  of  loess  interpolation  and  kriging;  both  fundamentally  provide 
uncertainty  estimation.  Finally,  we  briefly  preview  possible  future  techniques  based  on  Bayesian 
inference. 

2.1  “Splines-in-Tension”  Interpolation 

Spline  interpolation  constructs  curves  between  data  points  to  generate  a  gridded  surface.  “Splines-in- 
tension”  is  a  specific  technique  that  calculates  a  global  solution  by  solving  a  differential  equation. 
Constraints  often  include  data  points,  some  desired  mathematical  property  (e.g.,  continuous  first  and 
second  derivates,  minimum  curvature,  etc.),  and  boundary  conditions  at  the  end  points.  The  gridded 
surface  is  then  calculated  by  finite  differences. 

2.1.1  Methodology > 

The  geospatial  community  uses  techniques  by  Briggs  [13]  and  Smith  and  Wessel  [10]  to  calculate  the 
interpolation  surface.  The  elasticity  equation  for  thin  plates  derived  by  Love  [14]  is  the  basis  for 
generating  the  interpolation  surface,  which  is  modeled  as  an  elastic  plate  that  is  constrained  at  the 
locations  of  the  data  points.  Under  this  model,  the  differential  equation  to  be  solved  is  [10] 

(1  -  r)V3(v3z)-  7V3z  =  '£flS(x-x„y-yl),  (1) 

i 


where  T  is  the  tension;  z  is  the  two-dimensional  interpolation  surface;  f  =  z(xt,  y,),  the  /,h  known  value  for 
z  at  coordinate  (jc,-,  y,)  as  (x,  y)  — >  (x„  y,);  and  S[x  -  x„  y  -  y,)  is  the  Dirac  delta  function.  In  Briggs’s 
“minimum-curvature”  technique,  T=  0  so  that  Eq.  (1)  becomes 


dAz  dAz  d4z 
dx4  dx2dy2  dy 4 


£fAx-x,.,y 


y) 


(2) 
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The  solution  to  the  surface  is  one  where  sum  of  the  curvature  of  the  surface  is  minimized  (as  the  potential 
energy  of  the  plate  is  minimized  in  this  case),  and  found  by  imposing  the  conditions  of  continuous  first 
and  second  derivatives,  known  fixed  values  at  the  data  points  and  constraints  at  the  boundaries.  Briggs 
provides  a  finite  difference  solution  to  Eq.  (2),  which  was  later  coded  into  Fortran-IV  by  Swain  [15].  This 
minimum-curvature  method  was  used  for  earlier  versions  of  DBDB  [16]. 

Smith  and  Wessel  [10]  recognized  that  the  solution  for  Eq.  (2)  can  produce  false  minima  and  maxima. 
They  corrected  for  this  potential  for  artifacts  by  adding  the  tension  term  back  into  the  equations  to  be 
solved  and  provided  a  solution  for  Eq.  (1)  through  finite-difference  iteration. 

2.1.2  Available  Software 

Smith  and  Wessel’s  algorithm  is  coded  in  the  SURFACE  function  in  the  Generic  Mapping  Tools 
(GMT)  [17]  software  suite.  A  similar  splines-in-tension  algorithm  is  in  the  MBZGRID  in  the  MB- 
System  [5]  suite.  As  both  are  widely  used  open-source  software  packages. 

2.1.3  Advantages  and  Disadvantages 

An  advantage  to  the  technique  is  that  it  is  relatively  fast  and  is  exact  at  the  points  where  there  is  data. 
Also,  a  solution  is  found  globally.  A  disadvantage  is  that  the  setting  for  the  tension  term  is  arbitrary.  This 
tension  may  need  to  be  set  to  different  values  in  different  settings.  For  this  reason,  the  OAML  Feathering 
Algorithm  sets  the  tension  to  the  value  recommended  by  the  authors  of  the  code.  An  additional 
disadvantage  is  that  these  routines,  as  noted  by  Smith  and  Wessel,  do  not  provide  error  estimates  for  the 
interpolated  grid  points. 

2.1.4  Error  Estimation  from  Monte  Carlo  Simulation 

Jakobsson  et.  al.  [18]  developed  a  Monte  Carlo  technique  for  error  estimation  for  spline-in-tension 
interpolation.  The  technique  uses  positional  accuracies  of  the  ships  that  recorded  the  soundings  to 
estimate  a  Gaussian  probability  density  function  for  horizontal  positioning  error.  Using  a  subset  of 
bathymetric  data  from  the  International  Bathymetric  Chart  of  the  Arctic  Ocean  (IBCAO)  [19]  for 
validation,  the  horizontal  positions  of  the  soundings  are  randomly  perturbed  and  new  interpolated  surfaces 
are  generated.  Since  the  constraining  positions  are  different  for  each  iteration  of  the  simulation,  the 
calculated  solution  to  Eq.  (1)  changes  at  the  interpolation  points  as  well.  After  the  simulation  is  complete, 
an  uncertainty  layer  is  then  created  from  the  standard  deviations  of  the  solutions  for  each  grid  point.  This 
error  estimation  technique  is  discussed  in  Section  3  for  uncertainty  estimation  induced  by  horizontal 
positioning  error. 
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2.2  Linear  Smoothing  by  Locally  Weighted  Regression  (Loess) 

Linear  smoothing  techniques  construct  the  interpolated  surface,  z(s),  S  =  matrix  of  (x,y) 
coordinates,  from  a  linear  weighted  average  of  known  data  values.  Unlike  the  splines-in-tension  technique 
that  finds  a  global  solution  to  the  available  data,  this  technique  obtains  an  interpolated  value  at  a  grid 
point  by  using  only  a  subset  of  neighboring  points.  Mathematically, 

*(s)=2>(s (3) 


where  a(s  -  s,)  specifies  the  smoother  coefficients  and  the  index  i  corresponds  to  the  subset  of  local 
points,  z(S/),  to  be  used  for  the  interpolated  value  at  s,  z  (s).  A  common  technique  used  to  specify  these 
coefficients  is  locally  weighted  regression  (or  “loess”),  first  published  in  Cleveland  [20]  and  further 
developed  in  Cleveland  and  Devlin  [21].  A  recent  textbook  by  Givens  and  Hoeting  [22]  also  discusses  the 
technique.  The  methodology  determines  the  smoother  coefficients  from  a  weighted  least  squares 
polynomial  (linear  or  quadratic)  fit  of  windowed  data. 

2.2.1  Methodology > 

To  summarize  Cleveland’s  methodology,  the  two-dimensional  case  is  considered.  (We  extend  these 
equations  to  the  three-dimensional  case  in  Section  3;  the  methodology  remains  the  same.)  Let  data  points 
x,  and  v,  be  related  as 


yi=g{xi)+£i,  (4) 

where  g(x)  is  a  smooth  function  and  is  Gaussian  noise  with  mean  0  and  variance  a2.  Thus  we  can 
define j>(. to  be  the  estimate  ofg(x,)  (i.e.,  y.  ~  g(xi)).  Let 


•  the  number  n  be  the  predetermined  number  of  data  points  to  be  used  for  estimating  y. 

•  the  set**,  k=  1 he  the  subset  ofx/s  (J  =  1  N=  total  number  of  data  points,  n  <  N)  that 
are  closest  to  x, 

•  the  distance  /;,  be  the  distance  from  x,  to  the  furthest  x/, 

•  the  windowing  weights  to  be  used  for  the  regression,  wk(x,)  =  W([xk  -  x, ]//?,),  where  W(x)  is  the 
tricube  function,  defined  as 


W{x)=l^  H  1’  X<1  (5) 

[0  ,  otherwise 

With  these  definitions,  the  loess  procedure  calculates  the  set  of  polynomial  coefficients,  /^(x,.),  which  are 
the  values  for  the  ’s  that  minimize 

q(x.  )=Yjwk  (x,.  )(yk  -  A,  -  f\xk - pdxdk  f  ,  (6) 

k=\ 


where  q(x,)  is  the  error  function.  The  interpolated  value  for  g(x,)  is  then  [20] 
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yt  =  YjPdxiYi  = 


£=0 


k= 1 


(7) 


where  the  smoothing  terms  on  the  right-hand  side  correspond  to  the  right-hand  side  of  Eq.  (3)  rewritten  to 
the  present  two-dimensional  case.  Some  points  of  interest  are  as  follows. 

•  The  coefficients  obtained  have  a  decreasing  trend  to  the  edge  of  the  window  so  that  the  data 
centered  in  the  window  generally  have  the  greatest  influence. 

•  The  size  of  the  polynomial  typically  used  in  Eq.  (6)  is  d  =  1  or  2.  For  d  =  0,  the  result  is  a  simple 
moving  average.  The  d  =  1  case  is  called  linear  loess  or  “lowess”  smoothing,  and  d=  2  is 
quadratic  “loess”  smoothing. 

•  Cubic  and  higher  fits  typically  are  not  used  as  the  fits  can  become  over  fitted  and  numerically 
unstable  [23]. 

•  This  technique  also  has  a  robustness  option,  so  that  the  interpolation  can  be  shielded  from  the 
effects  of  outliers  in  the  data  (details  in  Cleveland  [20]). 

•  Cleveland  choose  the  tricube  weighting  function  because  it  allowed  the  estimate  of  the  error 
variance  to  be  approximated  by  a  chi-square  distribution  and  usually  lowered  the  variance  of  the 
estimate  surface  as  the  number  of  points  used  for  the  estimate  increased. 

Errors  propagated  into  the  interpolation  by  the  technique  are  straight  forward  to  compute.  Under  the 
assumption  the  data  follow  Eq.  (4),  the  estimate  of  the  variance  for  y. ,  a] ,  is  [20,  21,  24] 


I  kb,)]2 


k=\ 


(8) 


which  is  derivable  from  independent  error  propagation. 

2.2.2  Equivalent  Kernel  Approach 

Figure  3  plots  the  set  of  afs  as  determined  for  a  centered  impulse  response  using  the  linear  and 
quadratic  loess  interpolators  [25].  These  coefficients  are  the  smoother  weights  in  Eq.  (7)  and  are  also 
called  the  “equivalent  kernel.”  They  depend  only  on  the  grid  points,  and,  due  to  their  finite  width,  act  as  a 
window  or  low-pass  filter  function  on  the  spatial  data  (i.e.,  the  smoothing  weights  and  the  data  undergo 
convolution).  Elence,  one  could  bypass  solving  a  weighted  least  square  problem  and  simply  compute  the 
convolution  of  the  smoothing  window  with  the  data,  which  should  be  computationally  faster.  In  addition, 
other  windowing  functions  could  be  used  to  perform  the  smoothing. 


Elmore  and  Steed 


Fig.  3  -  Equivalent  kernel  weights  for  linear  and  quadratic  loess  windows  for  window  size  of  0.5 


The  use  of  other  weighting  functions  would  also  allow  one  to  interpolate  over  data  points  with 
differing  accuracy.  In  this  case,  Eqs.  (7)  and  (8)  become 


y,  =  Z  °k  (t  {  °k2l  Z  ak2  V,  =  Z  (■*,•  )y 

V  /  k=\ 


k=  1 


k  9 


(9) 


k=l 


°2  =  Z 

k= 1 

(  /  n 

where  o\  is  the  variance  ofyk  and  ak  {xj )  =  ak  (x; )  <Jk  VS 


k= i  y 


(10) 


2.2.3  Available  Software 

Cleveland  et  al.  [26]  have  written  publicly  available  Fortran  and  C  code  for  loess  interpolation.  The 
technique  is  also  part  of  the  publicly  available  LOCFIT  software  package  [27].  The  commercially 
available  Curve  Fitting  Toolbox  for  MATFAB  [28]  also  contains  loess  interpolation  routines. 
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2.2.4  Advantages  and  Disadvantages 

An  advantage  of  this  procedure  is  that  it  does  not  require  an  a  priori  model  or  other  function  to  be 
specified;  thus,  it  is  good  for  modeling  surfaces  that  have  complicated  topologies  [23].  A  disadvantage  of 
the  loess  technique  is  that  it  is  not  exact  at  the  data  points  and  the  interpolated  bathymetry  data  can  be 
subject  to  spatial  aliasing,  particularly  when  the  data  is  not  uniformly  sampled  or  when  there  are  features 
not  resolved  by  the  sonar  system  [24].  As  a  result,  additional  variance  and  bias  occurs. 

Schlax  and  Chelton  [29]  examined  the  aliasing  issue  for  equivalent  kernels  reviewed  by  Buja  et  al. 
[30],  which  included  running  means,  linear  interpolation,  Gaussian  windows  and  cubic  splines  in  addition 
to  linear  and  quadratic  loess  kernels.  When  the  data  points  are  evenly  sampled,  the  transmitted  errors  are 
lowest  for  the  quadratic  loess  kernel  when  interpolation  occurs  far  from  grid  boundaries.  Sparse  sampling 
induces  more  errors  due  to  aliasing  by  a  factor  of  between  4  and  8,  regardless  of  the  kernel  used,  with  the 
quadratic  loess  showing  the  most  increase  in  error.  Nonetheless,  in  Schlax  and  Chelton’ s  examples,  the 
quadratic  loess  kernel  still  did  the  best  for  all  but  one  (linear  interpolation)  of  the  kernels.  Plant  et  al.  [24] 
point  out  that  the  increased  sensitivity  is  due  primarily  to  the  negative  sidelobes  in  the  weights. 

2.3  Kriging 

Kriging  is  a  standard  technique  for  interpolating  sparse  geospatial  data.  Geospatial  data  analysis 
textbooks,  such  as  Davis  [9]  and  O’Sullivan  and  Unwin  [31],  and  the  monograph  by  Joumel  [32]  present 
the  technique.  Cressie  [33]  and  Chiles  [34]  provide  advanced  mathematical  treatments.  The  method  was 
first  introduced  by  Krige  [35]  and  developed  by  Matheron  [36].  Like  the  linear  smoothing  technique 
discussed  above,  it  is  a  linear  regression  technique  where  interpolated  values  are  estimated  from  a 
weighted  sum  of  neighboring  data  points;  however,  the  methodology  for  finding  the  weights  relies  on 
solving  a  system  of  simultaneous  linear  equations  (instead  of  minimizing  an  overdetermined  system) 
whose  terms  are  the  covariances  or,  alternatively,  variograms  of  the  data.  Thus,  before  discussing  the 
relevant  kriging  equations  for  bathymetry,  we  first  introduce  covariance  and  variograms. 

2. 3. 1  Covariance  and  Variogram  Matrices 

The  covariance,  in  terms  of  bathymetry  measurements  at  S(  and  S  . ,  z(S( )  andz(S;)  respectively, 
is  [33] 


cov(S,.,Sy)  =  ^{z(S,)z(S/)}-£{z(S/)}£{z(S/)} 
where  the  operator  E{}  is  the  expectation  value  of  the  enclosed  term.  In  matrix  form 


(11) 


COv(si5Sj) 

COv(Sj,S2)  • 

••  covfo.sjl 

c  = 

cov(s2,s,) 

cov(s2,s2)  • 

••  cov(s2,s„) 

(12) 

_cov(sf!,s,) 

c°v(s„,s2)  • 

••  cov(s„,s„)J 

Note  that  for  i  =_/,cov(Sf,S  •)  is  the  variance  ofz(Si)  ,  var(z(S;))  =  cr?(s  ( .  When  z(s)  is  statistically 
stationary,  then  the  covariance  simply  depends  on  the  displacement  h  from  s  (i.e.,  the  variance  is 
independent  of  position  so  that£'{z(s  +  h)}=  E{z{s)\).  Thus  for  stationary  conditions  [37], 
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cov(h)  =  £’{z(s  +  h)z(s)}-  [ft  (z(s)}]2 . 


(13) 


The  variogram,  2 y,  is  [33] 


2/(s, ,  Sj )  s  var(z(s,. )  -  z( s , )) 

=  var(z(s;.))+  var(z(s/))-2cov(si.,s/) 


(14) 


The  semivariance  is  half  the  variogram,  or  simply  y.  Under  stationary  conditions,  Eq.  (14)  becomes  [37], 

/(h)  =  cov(o)  -  cov  (h),  (15) 


where  c  o v  (0 )  =  var  (z  (s )} . 

Often,  the  semivariance  is  of  more  practical  use  than  the  covariance  because  an  estimate  for  the 
semivariance  is  straightforward  to  compute  from  observations,  particularly  along  one  dimension. 
Consider  n  such  observations  made  at  regular  intervals.  From  Eq.  (14),  the  empirical  estimate  of  the 
semivariance,  /(s( ,  S  . )  «  /(s,. ,  S  . ),  for  displacement  h,  y(sj ,  Si+h )  =  f{h),  is 


y(h)  =  X(z(  s(.)  -  z(Si+hjf /in  . 

i 


(16) 


Qualitatively,  the  semivariogram  (Fig.  4),  the  plot  of  the  semivariance,  generally  increases  with  h  from  a 
minimum  at  h  =  0,  which  may  or  may  not  be  zero  ( y( O)  y  0  is  called  the  “nugget  effect”),  to  a  horizontal 
line,  the  “sill,”  which  is  equal  to  overall  variance  of  the  observations.  This  qualitative  trend  holds  when 
the  data  is  a  regionalized  variable.  A  regionalized  variable  can  be  considered  as  a  hybrid  between 
deterministic  and  random  variables,  where  a  high  amount  of  correlation  exists  between  two  samples  that 
are  taken  close  to  each  other  spatially,  but  this  correlation  degenerates  as  the  spatial  distance  increases 
until  a  distance,  the  “range,”  is  reached  where  further  observations  appear  to  be  random.  Thus,  the 
variogram  remains  at  the  sill  for  values  of  h  greater  than  the  range  as  any  correlation  between  data  points 
is  lost. 
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Fig.  4  -  Example  semivariogram  with  no  nugget  effect  and  the  range  and  sill  noted.  At  the  sill,  the  semivariogram  equals  the 
overall  variance  of  the  observations.  This  semivariogram  is  a  plot  of  the  spherical  model,  Eq.  (44)  below,  with  a0  =  0,  a\  =  1.08, 
and  a2  =  0.7.  Adapted  from  Davis  [9], 


This  qualitative  behavior  makes  it  convenient  to  fit  experimental  variograms  to  modeled  ones.  We 
will  define  one  such  model,  called  the  “spherical”  model,  in  Section  3.2  below.  Other  models  include 
those  based  on  linear,  exponential,  or  Gaussian  functions.  Cressie  [33]  and  Davis  [9]  present  these  and 
other  models.  Once  the  variogram  is  modeled  and  fitted  to  the  data,  the  modeled  semivariance,  or  its 
translation  to  a  covariance,  is  then  used  to  solve  the  kriging  equations,  which  is  the  focus  of  discussion  in 
the  next  few  sections. 

2.3.2  General  Kriging  Equation 

All  kriging  methodologies  used  for  the  interpolation  of  two-dimensional  surfaces  have  a  similar 
general  set  of  linear  equations  [37]: 


z(s)-£{z(s)}=  ^,[z(s,)-£{z(sj}].  (17) 

i=i 

The  weights,  T„  are  found  by  minimizing  the  square  of  the  difference  between  the  two  sides  of  Eq.  (17). 
Depending  on  assumptions  that  can  be  made  about  the  expectation  values,  different  sets  of  equations  for 
solving  the  weights  (or  types  of  kriging)  result.  We  consider  three  types  of  kriging  below:  simple  kriging 
(SK),  ordinary  kriging  (OK),  and  universal  kriging  (UK).  (Other  forms  of  kriging  -  such  as  block  kriging, 
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cokriging,  nonlinear  kriging  -  are  outside  the  scope  of  this  review.)  These  three  types  of  kriging 
progressively  become  more  complicated  (and  with  larger  variance)  as  fewer  assumptions  about  the 
interpolated  surface  are  applicable.  Although  SK  may  not  be  as  applicable  to  bathymetry  estimation  as  the 
other  two,  its  discussion  facilitates  the  review  of  OK  and  UK.  The  reviews  below  are  adapted  primarily 
from  Davis  [9]  but  augmented  with  other  material  from  Cressie  [33]  and  Deutsch  and  Joumel  [37]. 

2.3.3  Simple  Kriging  (SK) 

In  SK,  the  mean  of  the  data,  m,  is  known  and  a  constant.  Hence,  fi{z(s)}  =  m  for  all  s  in  Eq.  (17).  In 
addition,  covariances  and  variances  are  assumed  to  be  independent  of  position.  These  assumptions  imply 
that  Eq.  (17)  reduces  to 


(  n  A  n 

zsAs)=m  ]~YA<  +ZvW. 

V  1=1  )  i= 1 


(18) 


where  the  SK  subscript  means  simple  kriging  estimate.  The  optimal  set  of  weights  at  position  s  is  found 
from  the  simultaneous  solutions  to  the  equations  (see  Joumel  [32]  for  derivation) 


«  .  . 

cov(s, ,S1 )  =  cov(s,  Sj ) 

1=1 

n 

cov(s/  s2)=  cov(s,  s2) 


1=1 


(19) 


n  .  v 

Zyl/Cov(s.sJ=cov(s,sJ 


or  in  matrix  format  CA  SK  =  B  SK ,  where 


A SK  ~ 

Bsk  =[cov(s,s1),---,cov(s,sh)]7 


(20) 


and  T  means  matrix  transposition. 

Thus,  /\SK  =  C~'Bsa:  .  Defining  Y SK  =  [z (s ,  ) -  m  ,...  ,  z(s  n  ) -  m  ]r  ,  Eq.  (18)  becomes 

=  m  ^sk^sk  =  m  XskC  .  (21) 

Equation  (21)  is  the  solution  for  the  interpolation  surface  at  s.  The  estimated  variance  is 

Ak  (So  )  =  var(z(s0  ))  -  B7a  C  'B,,  .  (22) 

2.3.4  Ordinary  Kriging  (OK) 

In  OK,  the  assumption  that  E\z{s)\  =  m  is  maintained,  but  now  m  is  unknown.  Hence,  the  system  of 
simultaneous  equations  needs  an  additional  constraining  condition  to  obtain  a  solution.  This  condition  is 
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that  the  sum  of  the  weights  is  one:  £  "=1  /£,-  =  1  ■  Then,  Eq.  (18)  becomes  (with  the  SK  subscript  replaced 
with  OK  for  ordinary  kriging  estimate) 

fOA'(s)  =  ZV(s,)-  (23) 

i= 1 

The  set  of  equations  now  needing  to  be  solved  are  equivalent  if  either  the  covariance  or  semivariance  are 
used  (see  Joumel  [32]  or  Cressie  [33]  for  derivation). 

n  n 

^AI.COv(siS1)+//0  =COv(s,s1)  or  £V(S/,Si)+A)=Hs,S1) 

i= 1  i= 1 

n  n 

^/L,.cov(s,s2)  +  //0  =cov(s,s2)  or  Xw2)+A=HS.S2) 

/= 1  1=1 

i  (24) 

Eyl.cov(s.sJ+//0=cov(s,sJ  or  XV(s;sJ+//0=Hs,sJ 

1=1  i= 1 

n 

=  i 

i=i 

where  juq  is  a  Lagrange  multiplier,  which  is  a  slack  variable  used  to  raise  the  number  of  unknowns  from  n 
to  n+1  for  the  system  on  /?+l  equations.  In  matrix  form,  Eq.  (24)  is  \NA()K  =  BOA, ,  where 

A  OK  =  [^1 

cov(s,,s,)  •••  cov(s„,s,)  ll  rKs^Sj)  •••  Hs,„Si)  1 

:  :  1  :  :  1 
cov(s,,s„)  •••  cov(s„,s„)  1  •••  Hs„,s„)  1 

1  1  1  oj  L  1  1  1  0 

B0A  =[cov(s,s, ),•••, cov(s,sJ, l]7  or  [r(s,s1),---,^(s,s„),l]r 
Thus,  Aok  =  \N  lBOK  .  Then,  defining Y0K  =  [z(Sj ),..., z(sn),0]r  ,  Eq.  (23)  becomes 

^Ox(S)  =  ^OK^OK  =  B  OK-  (26) 

The  variance  estimate  is 

^(s0)  =  var(z(s0))-B^W-'BOA  (27) 

when  the  covariance  is  used  or 


when  the  semivariance  is  used. 


ai(s,)=BLW'BOI 


(28) 
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2.3.5  Universal  Kriging  (UK) 

Going  one  step  further  in  generality  from  OK  is  UK.  In  UK,  E\z{s  )\  =  m(s)  is  no  longer  a  constant 
mean,  but  now  “drifts”;  however,  var{z(s)}  is  still  assumed  constant.  With  these  properties,  UK 
methodology  builds  upon  OK  in  the  following  manner.  First,  note  that  z( s)  can  be  considered  to  be 
composed  of  two  components:  the  drift  component,  m(s),  that  specifies  the  slowly  varying  expectation 
value  and  a  residual  component,  q( s)  =  z(s)  —  m(s),  that  gives  the  difference  between  the  observations 
and  the  trend.  Since  observations  and  the  drift  should  follow  the  same  trend,  the  residuals  should  have  a 
constant  mean  and,  given  a  covariance  matrix  or  semivariance  for  the  residuals,  can  be  found  from  OK. 
Next,  the  drift  is  modeled  as  a  linear  or  quadratic  polynomial  so  that  (after  combining  the  mean  of  the 
residuals  with  «0) 


m 


a0  + 


n 

«o  +  X(«.U  +«2.v,) 

i= 1 

+  «2  V,  +  arf  +  a4y;  +  a5xi  y) 

Z=1 


(29) 


Then,  under  the  assumption  of  Eq.  (29),  the  system  of  equations  z(s )  =  cj(s  )+  m(s )  is  the  OK  equations 
for  the  residuals  expanded  with  extra  Lagrange  multipliers  to  account  for  the  coefficients  of  the  drift 
model.  Note  that  since  the  residuals  are  the  terms  in  the  OK  equations,  the  covariance  and  semivariance 
for  the  residuals  are  now  cov?  (sj ,  S  , )  and  y  (s;. ,  S  . ) . 

Thus,  for  linear  drift,  the  set  of  equations  to  solve  is  (again, cov(?(s;, S- )  andr  (sj,S/)  are 
interchangeable,  but  we  will  only  print  the  semivariance): 


Z4r?(s/,si)+a0  +«1^1  +«2Ti  =/?(s?si) 

i= 1 

n  ,  \ 

ZV<?(S,S2)+«0+«lW  +«2T2  =/?(S5S2.)  (3°) 

i= 1 


ZV,  (»,■»»)  +  «o  +  «i  xn  +  «2t„  =  r?(s,sj 

Z=1 

Z  A  =  i;  =  x;  Z^t,  =  y 

i= 1  i= 1  i= 1 


(The  quadratic  drift  follows  the  same  pattern  with  the  required  extra  terms.)  In  matrix  form,  Eq.  (30) 
becomes  WA(  A,  =  B,/A, ,  where 


Although  we  do  not  use  UK  in  Section  3,  we  describe  it  here  for  the  sake  of  completeness. 
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^UK  [A  ’ '  "  ’  ’  Mo  ’  M\  >  Ml  ] 


>,(81,8,)  ••• 

7?(s„,S 

) 

1 

X, 

y  i 

s,)  ••• 

7„(s„,s, 

) 

1 

Xn 

yn 

1 

1 

0 

0 

0 

U 

Xn 

0 

0 

0 

T, 

yn 

0 

0 

0 

Thus,  Auk  =  W  lBUK .  Defining  matrix  Y(/A,  =  [z(s, ),..., z(s„ ),0,0,0f  ,  the  UK  solution  for 

n 

Z UK  (®)  =  ^ ]  AZ(®i  ) 


(31) 


(32) 


ZUk(S)  ~^UK^UK  ® UK •  (33) 

The  associated  variances  have  the  same  form  as  Eq.  (27)  if  covariances  are  used  or  Eq.  (28)  if 
semivariances  are  used. 

2. 3. 6  Available  Software 

Deutsch  and  Journal  [37]  developed  the  open-source  geostatistical  software  suite,  GSLib  [38],  for 
calculating  variograms,  covariances  and  interpolated  surfaces  by  a  multitude  of  kriging  algorithms, 
including  SK,  OK,  and  UK.  The  software  is  available  in  both  Fortran  77  and  90.  Although  the  user  may 
provide  semivariances,  GSLib  converts  semivariances  to  covariances  before  solving  for  the  kriging 
equations  ( ;/(())  poses  numerical  problems). 

2.3. 7  Advantages  and  Disadvantages 

As  Davis  discusses  (Ref.  [9],  p.  418),  some  strengths  of  kriging  are  as  follows:  1)  the  technique 
provides  exact  interpolation  at  the  data  points,  2)  error  may  be  estimated  for  every  grid  point  on  the 
interpolated  surface,  and  3)  the  error  estimates  are  the  lowest  of  all  linear  estimation  methods.  Pitfalls,  as 
discussed  by  O’Sullivan  and  Unwin  ([31],  pp.  280-1)  include  the  following  points:  1)  the  fit  of  the 
interpolation  to  the  data  depends  on  the  validity  of  the  variogram,  which  is  often  modeled;  2)  kriging  is 
computationally  intensive,  especially  the  part  where  the  inverse  of  the  C  or  W  matrix  is  required,  and  may 
be  subject  to  rounding  errors  that  may  or  may  not  be  significant. 

2.4  Bayesian  Approaches 

The  approaches  discussed  above  appear  to  be  the  main  techniques  for  interpolation,  fusion  and  error 
quantification  which  have  been  devised,  applied  to  and  validated  for  bathymetry  data.  Some  newer 
approaches  to  data  interpolation  involve  the  use  of  Bayesian  inference  and  Monte  Carlo  calculations  to 
obtain  estimates  and  uncertainty  of  geospatial  data  on  a  rectangular  grid.  Implementation  of  these 
methods,  however,  are  judged  to  be  immature  for  operational  use  as  they  do  not  appear  to  have  been 
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validated  for  bathymetry  and/or  require  research  efforts  for  adaptation  to  bathymetry  fusion,  especially  if 
the  fused  product  is  to  be  used  for  a  hydrographic  chart.  They  are  mentioned  to  provide  possible  future 
research  directions. 

De  Oliveira  [39]  and  Hartman  [40]  present  a  Bayesian  inference  approaches  to  obtain  conditional 
probabilities  density  functions  (cdf)  for  geospatial  variables  at  regular  grid  points  given  measurements 
and  the  associated  errors  at  random  locations.  The  interpolated  values  are  then  the  mean  or  median  of  the 
conditional  cdf  and  the  95%  confidence  interval  provides  the  uncertainty.  Both  approaches  model  the 
background  from  which  the  data  is  sampled  as  a  stochastic  field.  (De  Oliveira  models  to  background  as  a 
Gaussian  random  field,  while  Hartman  uses  a  Gaussian  Markov  random  field.)  Bayesian  inference  is  then 
used  to  estimate  parameters  that  specify  these  fields,  given  the  information  contained  in  the  observations. 
The  Bayesian  inference  however,  requires  integration  over  a  highly  dimensionalized  space,  so  the 
integration  is  approximated  from  Markov  chain  Monte  Carlo  calculations. 

Goff  et  al.  [4 1  ]  use  a  posteriori  resampling  to  remove  noise  from  geospatial  data,  including  the  same 
bathymetry  set  discussed  in  Calder  [8].  This  technique  may  be  adaptable  to  regridding  data  at  new  grid 
points.  Developmental  work  is  needed  first  to  adapt  the  algorithm  to  providing  this  capability.  Also,  the 
authors  admit  that  this  approach  is  computationally  expensive  to  implement  because  the  data  surface  that 
is  generated  is  dependent  on  the  order  that  the  data  is  analyzed,  so  the  data  needs  to  be  reanalyzed 
multiple  times  and  an  average  taken. 

3.  RECOMMENDED  APPROACH 

It  is  recommended  that  the  new  fusion  algorithm  be  based  on  the  procedure  given  by  Calder  [8], 
which  uses  a  three-step  approach  to  fuse  bathymetry  data  sets  into  one  surface  and  provide  an  uncertainty 
layer.  For  hydrography,  the  methodology  also  discusses  how  to  apply  the  uncertainty  to  assess  the  order 
of  survey  as  defined  by  the  International  Hydrographic  Organization  [42].  Figure  5  illustrates  the 
adaptation  of  Calder’s  method  for  the  new  algorithms. 
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Fig.  5  -  New  bathymetry  fusion  algorithm.  Data  flow  is  left  to  right  with  z's  and  d s  referring  to  depths  and  uncertainties  for 
either  archived  grids  (top)  or  new  in  situ  data  (bottom).  Other  symbols  (see  text  for  meaning)  and  component  build  schedules  are 
in  the  key.  Adapted  from  Calder  [8]. 


The  steps  of  the  methodology  are  as  follows. 

1.  Following  Plant  et  al.  [24],  interpolate  the  data  with  the  quadratic  loess  interpolation  technique  of 
Cleveland  [20]  to  provide  a  trend  surface  for  the  bathymetry  and  uncertainty  layer. 

2.  Restore  finer  details  smoothed  by  the  interpolation  from  ordinary  kriging  of  the  residuals;  add  the 
errors  associated  with  kriging  to  the  uncertainty  layer  from  Step  1  assuming  statistical 
independence  (i.e.,  the  variances  add).  The  surface  generated  from  kriging  the  residuals  is  the 
residual  surface. 

3.  Estimate  additional  uncertainty  caused  by  positional  errors  from  the  Monte  Carlo  technique  of 
Jakobsson  et  al.  [18],  but  repeat  steps  1  and  2  above  instead  of  using  the  splines-in-tension 
algorithm  at  each  iteration.  As  before,  add  the  estimated  error  to  the  uncertainty  layer  by 
assuming  statistical  independence. 

In  equation  form,  the  final  bathymetry  surface,  Z(s),  is  the  sum  of  the  trend  surface,  //(s),  and  the  residual 
surface,  R(s). 


Z(S)  =  //(S)  +  R(s) 


(34) 
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The  uncertainty  layer,  cr^s),  is 


^z(s)  =  [^(s)+^(s)  +  ^2(s)]1/2  ,  (35) 

where  cy/s)  and  <Jr(s)  are  the  uncertainties  for  the  trend  and  residual  layers,  respectively,  and 
<s(s)  is  the  uncertainty  layer  associated  with  positional  errors. 

This  approach  appears  to  be  the  most  comprehensive,  validated  methodology  for 
interpolating  bathymetry  and  providing  uncertainty  estimates.  Specifically,  it  1)  combines 
accepted  techniques  to  attain  a  fine  detailed  bathymetry  surface  with  uncertainty  estimation.  2) 
was  demonstrated  and  validated  in  [8]  using  two  sets  of  data  for  the  New  Jersey  Atlantic  Shelf - 
vertical  beam  echo-sounder  data  from  the  1970s  and  multibeam  echo-sounder  data  from  the 
1990s.  In  addition,  NRL  already  has  code  for  loess  interpolation  of  bathymetry  data  based  on  the 
work  of  Plant  et  al.  [24],  which  has  now  been  augmented  with  other  weighting  windows  to 
provide  other  interpolation  options  and  possibly  faster  computation.  In  addition,  NRL’s  code  also 
can  correct  data  sets  that  have  different  vertical  offsets  to  alleviate  interpolation  over  false  cliffs. 

3.1  Quadratic  Loess  Interpolation  for  Trend  Surface 

Extending  Eqs.  (5)  through  (8)  to  two  dimensions,  the  equations  for  calculating  the  trend  surface, 
ju(s)  in  Eq.  (34)  are  as  follows: 


//(s)  =  pr(s)p(s), 


(36) 


where  pr(s)s  [x2, y2,xy,x,  v,l]and  vector  (5(s)  =  [/(-(s), .... /?(|(s)]z  being  the  set  of  /(Jsfs  that 
minimize  the  weighted  least  squares 


?(*)=£  wk  (sfc  -  Ao (s)  -  A  (s)  v,  -  p2  (s)xk  -  /?3  (s)x,  v,  -  fi4  (s)  vA2  -  fi5  (s)x2  )2  (37) 

k= 1 


using  the  two-dimensional  tri-cube  weighting  function 


w, 


(»)  = 


1- 


s-s, 


d ,, 


3  A 


s-s, 


dr. 


<  1 


otherwise 


(38) 


and  the  user  provided  value  for  c/0.  Calder’s  paper  suggests  that  do  be  ten  times  the  largest  sample  spacing. 
Information  lost  by  oversmoothing  is  regained  when  the  residuals  are  calculated  in  the  kriging  portion. 

Estimates  for  the  variances  follow  Eq.  (8)  for  like  variances  in  all  the  data  or 
a2  (s)  =  Y!l=\  \ak  ]2  f°r  differing  variances.  The  «k(s)’s  now  have  azimuthal  symmetry  and  the 
same  radial  dependence  as  the  1-D  case. 
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In  addition,  the  estimate  is  made  more  robust  (i.e.,  eliminates  outliers)  by  flagging  estimates  greater 
than  three  Mahalanobis  units  as  “no  data.”  The  Mahalanobis  distance  [43],  M(p,m,C),  is 

M(|j,m,C)  =  (|j-m)rC  ^(iJ-m)  (39) 

where  p(s)  is  the  vector  of  all  estimated  depths  and  m(s)  and  C(s)  are  the  corresponding  mean  depth 
measurements  and  covariances  for  the  windowed  (c.f.  Eq.  (38))  data  sets.  To  intuitively  explain  Eq.  (39), 
suppose  that  the  data  points  are  all  independent  and  have  mean  m  and  variance  cr.  The  C  matrix  is  all 
zeros  except  for  the  diagonal  elements,  which  are  all  cf.  Then,  if  an  estimate  differs  from  its 
corresponding  windowed  mean  by  more  than  three  variances  (i.e.,  (//  —  m)~ / <j2  >3)  ,  that  estimate 
receives  the  “no  data”  mark.  Equation  (39)  generalizes  this  simpler  scenario  to  account  for  covariance 
between  the  data. 

3.2  Kriging  the  Residuals 

Since  loess  interpolation  is  not  exact  at  the  data  points,  residuals,  i?(s,.),  exists  between  actual 
measurements,  z(s;),  and  the  corresponding  estimated  depths  along  trend  surface,  Ju(sj ) ,  such 
that  f?(s; )  =  z(s,. )  —  ju{si ) .  The  residual  surface  in  Eq.  (34)  is  found  from  interpolation  of 
the  R(s/  )  set  using  ordinary  kriging,  as  an  overall  but  constant  unknown  bias  may  exist  in  the 
residuals.  The  variogram  will  likely  have  directional  anisotropy,  so  that  2y(sn  Sj)  =  2y(h,6), 

where  h  is  the  separation  distance  between  the  two  points  and  6  is  the  heading  angle  (clockwise 
from  the  north). 

To  account  for  this  anisotropy,  the  following  semivariance,  yD{sj,  S-),  is  used  and  constructed 

(unconditionally  valid  for  two-dimensions  [44])  in  the  following  manner:  1)  obtain  an  empirical  estimate 
of  the  variogram,  2)  compute  the  average  azimuthal  variogram  from  the  empirical  estimate  to  detect  the 
directions  of  minimum  and  maximum  variation,  3)  fit  the  variograms  along  these  two  axes  to  the 
spherical  model  for  variograms,  and  4)  using  parameters  from  the  fits  to  the  spherical  model  and  the 
direction  for  minimum  variance,  construct  yD{si,  S ;  )  from  Eqs.  (45)  -  (48)  below.  These  steps  are  now 
discussed  in  more  detail. 


Step  1\  Following  Cressie  ([33],  p.69),  the  variogram  for  the  residuals  is  approximated  using  the 
methods-of-moments  (or  classical)  estimator,  in  blocks  sizes  =  2do(c.f.  Eq.  (38)),  so  that 


2y(h,0)~2y(hi,0)  = 


N(hi  )  (aJ>)eN(ht  yvf{0j 


Y.(R(sJ-R(st)f 


(40) 


(Note  that  Eq.  (40)  is  a  more  general  form  of  Eq.  (16).)  The  sets  N(h:)  and  N {dl )  contain  binned 
separation  distances  and  heading  angles  as  defined  by  the  equations 


Ah, 


<d(sa,sb)<h,.+ 


2 


(41) 
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N(0,)=  {(«.*>):  0,  - ^  S  4*„ .*  J S  0,  +  ^} ,  (42) 

where  c/(sa,S6)  and  Z(sa,S/;)  are  the  Euclidian  distance  and  heading  angle  between  sa  and  S* 
and|/V(/2;)|  and  \N{dl  )|  are  the  number  of  bins  in  these  sets.  For  the  data  set  analyzed  in  [8], 
Ahj  =  250m ,  A0:  =  45° ,  hj  =  iAh  +  Ah  /  2  and  0  j  =  /A 0  . 

Step  2\  The  average  azimuthal  variogram,  2 y(o j )  =  | N(hi  )|  *  ^  2 ;/(/;, ,  0 } )  is  then  calculated.  In  [8], 
2y{0j )  had  two  sets  of  maximum  and  minima  (caused  by  trending  ridges),  so  it  could  be  modeled  by  the 
function 

2 f(0j ) *  0.5g0  +  g2  cos(4 nOj  +  fa),  (43) 

which  is  the  second  Fourier  eigenfunction,  Equation  (43)  is  fitted  to  the  data  by  evaluating  the  second 
discrete  Fourier  transform  coefficient.  The  phase  constant,  fa,  is  the  radian  angle  (which  goes 
counterclockwise  from  the  east)  where  the  first  minimum  is  found.  It  is  changed  to  a  heading  angle 
(again,  clockwise  from  the  north),  0m ,  from  the  transformation  6m  =  —  fa/ 2  +  n /2  .  This  angle  is 

obtained  for  each  2 d0  block,  then  loess  interpolated  to  get  0m  (s; )  for  Step  4  below. 

Step  3:  Two  finer-scaled  directional  variograms  were  then  calculated;  one  in  the  6m  direction,  the 
other  in  the  perpendicular  direction,  0^ ;  using  the  methods-of-moments  estimator  with  angle  bins  of 
±  k/2  about  each  direction.  In  this  case,  Ah  =  100m  and  the  data  were  truncated  to  those  within  95%  of 
the  mean  to  reduce  outliers.  These  empirical  variograms  are  fitted  to  the  standard  spherical  variogram 
model  (Cressie  [33],  Eq.  2.3.8;  Davis  [9],  Eq.  4.98) 

0,  h  =  0 

2y(h)=  <  a0  +  al(l.5(h/a2)-0.5(h/a2y\  0<h/a2<l  (44) 

aQ+ax,  hi  a2>\ 

using  the  Levenberg-Marquardt  algorithm  [45,  46]  for  fitting  (i.e.,  solving  for  a(h  a\,  and  a2).  Let  the 
modeled  variogram  along  the  0m  direction  he  2;/0  (/; )  and  the  fitting  constants  be  a0,  alt  and  a2.  Similarly, 

let  the  modeled  variogram  along  the  0^  direction  be 2;/0  (h ) ,  and  the  fitting  constants  bca^  ,  a\  ,  and 
a2  .  When  solving  for  both  sets  of  constants,  the  variograms  are  constrained  to  equal  the  same  sill  at  large 
distances  so  that  a q  =  a0  andaj1  =  al ,  but  a2  4  a2  in  general. 

Step  4:  Define  the  anisotropy  parameter,  ocaniso  =a-,/a2  .  Then,  y n  (s; ,  S  ),  is  evaluated  in  the 
following  manner. 


where 


S y )  =  y0{d'{Sj , S  ■ )), 


(45) 
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<*'(», -»s,)  =  A(6'„,(s,),«a„.j(s, 


A(#„,  (s,  )>  aaniso )  =  R(-  °,n  (s,  ))diag{  1,  aaniso  )R(£„,  (s,  )) , 
and  R(<9„,  (s;  ))  is  the  standard  two-dimensional  rotational  matrix  [47] 


cos^Js,))  sin(6»m(s,))' 
-sin(^(s,))  cos(6>m(s,)) 


(46) 

(47) 


(48) 


For  applications  where  only  the  bathymetry  is  needed,  S/),  may  be  sufficient  for  use.  In 

hydrographic  situations,  however,  where  extra  caution  is  required  for  navigation  safety,  an  additional 
“hydrographic  uncertainty”  variogram  is  added  to  increase  the  total  uncertainty  for  safety  (see  Calder  [8] 
for  details).  This  variogram  is  defined  to  be 


2rH{h)  =  b  0+b,h.  (49) 

The  choice  of  the  coefficients  is  arbitrary,  but  Calder  uses  b0  =0and  b]  =  6.5 1x10  4  in  so  that  the  95% 

confidence  interval  of  the  uncertainty  increases  by  0.05  m  for  every  meter  of  horizontal  separation  in 
soundings.  Thus,  for  hydrography,  the  final  variogram  to  use  is 

2rtoto/(si5Si)=2rn(si  ,Sy)+2^|si  -S,.|)  (50) 

3.3  Monte  Carlo  Estimation  of  Depth  Error  Due  to  Positional  Stability  Errors 

This  last  part  is  of  use  for  fusion  with  soundings  data,  either  with  other  soundings  data  or  a  historical 
grid,  as  the  position  of  the  soundings  contains  errors  (this  part  is  not  applied  when  fusing  historical 
gridded  data  sets).  As  before  (c.f.  Section  2.1),  positioning  errors  result  in  errors  of  depth  estimates  on  the 
interpolation  surface  as  the  position  of  the  soundings  affect  the  interpolated  solution;  the  Monte  Carlo 
technique  of  Jakobsson  et  al.  [18]  is  used  to  estimate  this  error.  At  the  beginning  of  each  iteration,  the 
locations  of  the  soundings  are  perturbed  according  to  a  probability  density  function  appropriate  for  the 
sounding,  often  assumed  to  be  Gaussian  unless  otherwise  known  (c.f.  Fig.  5  in  Calder  [8]  for  a  non- 
Gaussian  example).  Loess  interpolation  and  kriging,  using  the  variogram  from  the  unperturbed  set,  are 
repeated  at  each  iteration.  The  standard  deviation  of  the  solutions  at  each  interpolation  point 
provides  ^(s)  inEq.  (35). 
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4.  ALGORITHM  DEVELOPMENT,  REQUIREMENTS,  AND  TEST  CASES 

NRL  plans  to  develop  a  prototype  build  in  FY07,  followed  by  two  builds  in  FY08  and  FY09,  and  to 
deliver  these  builds  to  the  Bathymetry  Database  Division  of  NAVOCEANO.  In  addition  to  illustrating 
data  flow,  Fig.  5  shows  that  required  components  for  implementing  the  procedure  discussed  in  Section  3 
and  color  codes  them  by  build  schedule.  The  prototype  build.  Build  0,  will  consist  of  NRL’s 
“mergeBathy”  utility  (translated  to  the  C++  programming  language)  with  its  capability  to  loess  interpolate 
in  situ  collected  data  to  one  interpolation  surface  with  uncertainty.  In  addition,  Build  0  will  include 
support  for  exporting  grid  data  to  the  Open  Navigation  Surface  BAG  file  format.  Build  1  is  focused  on 
fusion  of  gridded  historic  data.  In  Build  1 ,  Build  0  will  be  coupled  with  variogram  and  kriging  routines  to 
provide  the  historic  bathymetry  fusion  capability.  Build  2  adds  the  capability  to  estimate  depth  errors 
induced  by  positional  errors  from  in  situ  data. 

As  the  fusion  algorithm  develops,  NRL  can  perform  validation  using  the  following  data. 

1.  DBDB-V  grids  of  varying  resolutions  used  in  the  OAML  Feathering  Algorithm  report  [3]. 
Uncertainties  can  be  provided  from  soundings  collected  in  these  areas  via  CUBE. 

2.  Data  collected  as  a  part  of  the  FY04-FY06  AN/AQS-20A  RTP  effort  [48].  These  data  were 
collected  off  Panama  City,  FL.  They  include  five  sets  of  multibeam  data  from  AN/AQS-20A 
flights,  high-density  multibeam  data  from  dedicated  NAVOCEANO  bathymetry  surveys  in  2002 
and  2006,  historic  NGDC  single  beam  soundings,  and  NOAA  gridded  data  that  is  now  in  DBDB- 
V.  Uncertainties  have  been  calculated  for  these  data.  Estimates  of  uncertainty  for  the  historic  data 
will  be  needed. 

3.  Scripps  Canyon  data  provided  by  Plant  for  validation  of  his  interpolation  technique  [24],  These 
data  have  uncertainty  estimates. 

4.  A  NAVOCEANO  data  set  or  collection  of  data  sets  where  we  would  test  the  algorithms  using  all 
of  the  lines  and  then  every  other  line  to  see  how  the  results  differ,  particular  after  applying  the 
interpolation  and  then  kriging  as  described  by  Calder’s  2006  paper.  Error  estimates  can  be 
provided  via  CUBE  processing. 

5.  SUMMARY 

This  report  reviews  the  current  state-of-the-art  for  fusion  algorithms  applicable  to  bathymetry  data 
and  motivated  by  NAVOCEANO’s  need  for  intelligent  fusion  algorithms.  Currently,  NAVOCEANO’s 
DBDB-V  incoiporates  the  OAML  Feathering  Algorithm  to  smooth  discontinuities  that  occur  between 
tiles  in  the  database  with  dissimilar  resolutions.  This  solution  still  leaves  artifacts  and  does  not  provide 
estimates  of  uncertainty.  NAVOCEANO  needs  a  better  interpolation  technique  that  also  provides 
uncertainty  estimates  for  robust  interpolation  and  to  use  one  database  for  multiple  proposes,  including 
navigation. 

State-of-the-art  interpolation  techniques  include  splines-in-tension,  loess  interpolation,  and  kriging. 
Splines-in-tension  solves  a  differential  equation  numerically  to  obtain  an  interpolated  surface.  This 
technique  is  the  basis  of  the  OAML  Feathering  Algorithm.  It  does  not  provide  error  estimation  readily, 
although  a  Monte  Carlo  technique  can  be  used  to  provide  estimates.  Loess  interpolation  is  a  localized 
weighted  regression  technique  to  fit  data  to  a  low-order  polynomial.  It  provides  quantified  uncertainty 
estimates.  The  technique  is  not  exact  and  acts  as  a  low-pass  spatial  filter,  so  bias  in  the  interpolated 
surface  occurs  and  finer  details  are  smoothed.  Kriging  is  a  methodology  for  providing  interpolation  by 
solving  a  system  of  simultaneous  equations.  It  also  provides  quantified  uncertainty  estimates  and  is  exact 
at  the  observation  points.  The  technique  depends  on  the  model  or  the  choice  of  variogram  to  be  reliable 
and  the  matrix  inversion  is  computationally  intensive.  This  report  also  briefly  reviews  Bayesian 
approaches  that  are  being  developed  in  the  academic  community.  These  approaches  may  provide  future 
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fusion  methodologies,  but  need  to  be  demonstrated  with  bathymetry  data  first  at  the  research  and 
development  level. 

The  recommended  approach  for  intelligent  data  fusion  was  published  and  validated  for  bathymetry 
data  recently  by  Calder  [8].  His  approach  uses  both  loess  interpolation  to  obtain  a  trend  surface,  followed 
by  kriging  of  residuals  to  recapture  finer  details  lost  by  smoothing.  In  addition,  if  in  situ  soundings  are 
used,  Monte  Carlo  simulations  are  run  to  estimate  depth  error  induced  by  position  errors.  The  technique 
provides  the  means  to  also  liberally  estimate  errors  for  navigation  safety  through  the  use  of  a 
hydrographic  comfort  variogram. 
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